-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inbound: Improve policy metrics #1237
Conversation
Now the `name` label is required on both servers and authorizations. Other labels are ignored in metrics.
We recently introduced metrics to help surface inbound policy decisions, but in practice these haven't been as useful as we might hope. Specifically, error metrics do not include the `target_addr` label so these metrics can't be correlated with servers, etc. This change improves error metrics and also introduces new metrics to describe authorization decisions: authorization denials shouldn't be classified as errors, really, anyway. This change also improves TCP forwarding authorization so that policy changes can be honored at runtime: previously authorized connections may dropped if the policy is updated so that the connection is no longer authorized. The gateway is also updated to enforce HTTP policies at runtime as well so that policy changes can be honored after the connection has been established. This change introduces new metrics: * `inbound_http_authz_allow_total` * `inbound_http_authz_deny_total` * `inbound_tcp_authz_allow_total` * `inbound_tcp_authz_deny_total` * `inbound_tcp_authz_terminate_total` _allow_ metrics include `target_addr`, `srv_name`, and `saz_name` labels. _deny_ and _terminate_ metics include only `target_addr` and `srv_name` labels. Authorization denials are no longer reflected in inbound_tcp_error or inbound_http_error metrics. A number of internal changes have been made to support this: * The `inbound::policy::authorize` module includes middlewares for TCP and HTTP authorization, replacing the prior method of enforcing policy in the stack/router. This module ensures that metrics are recorded for policy decisions. * The `error-metrics` crate has been removed. In its place a `monitor` type has been added to the `stack` crate, supporting a general way to observe errors, decoupled from the metrics registry. * Inbound and outbound error metrics are now tracked in the inbound and outbound crates, respectively. Inbound- and outbound-specific error types are also moved into their rspective crates. * The `app_core::errors` module has been updated to only define the types it needs to instrument the error response layer. Error responses are now primarily instrumented via the `HttpError` type so that errors that should be handled can be configured where the error is thrown. The error type now holds an underlying source error so that the error metrics layer can see through this wrapper type to track the underlying error cause. * Server & Authorization labels are no longer handled as a free-form maps. We currently read only the `name` label from each; and this label is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, this looks good to me, i had mostly minor comments. i do think we may want to just replace the RwLock
s with Mutex
es, because it seems like we're not really getting any benefit from them --- we never have concurrent reads, and it might just be adding overhead.
// should not be calling poll_data if we have set trailers derived from an error | ||
assert!(trailers.is_none()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/TIOLI: should we maybe just move the comment into the panic message for the assert?
@@ -355,6 +345,7 @@ impl tap::Inspect for Logical { | |||
} | |||
|
|||
fn dst_labels<B>(&self, _: &http::Request<B>) -> Option<&tap::Labels> { | |||
// TODO include policy labels here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be addressed as part of this PR, or is it a follow-up change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a followup change already up
This release features some performance improvements: tokio has been updated to pick up tokio-rs/tokio#4055, and link-time optimizations have been enabled in release builds. These changes reduce CPU and memory overhead in benchmarks. Inbound policy enforcement has been updated so that TCP forwarding is interrupted if a policy update revokes a previously-established authorization. New metrics are exposed to reflect how policies are used by the proxy: `inbound_http_authz_{allow,deny}_total` and `inbound_tcp_authz_{allow,deny,terminate}_total`. The proxy's error metrics, `{inbound,outbound}_{http,tcp}_errors_total`, have been updated to include the traffic target. And the `traffic_addr` metric label is augmented by `target_ip` and `target_port` labels to support more flexible prometheus queries. Inbound TCP metrics now only include a `srv_name` label, as it can't be expected for all inbound connections to include authorization labels (hence the new authz metrics). However, all inbound HTTP metrics--except for the HTTP errors metric, which includes only a `srv_name` label--include both `srv_name` and `saz_name` label. Finally, the inbound and outbound proxies now only exports Route-oriented metrics when a ServiceProfile is enabled, preventing redundant metrics from being exported with no differentiating labels. --- * profiles: Avoid creating a default route stack (linkerd/linkerd2-proxy#1223) * build(deps): bump arbitrary from 1.0.1 to 1.0.2 (linkerd/linkerd2-proxy#1224) * build(deps): bump trust-dns-resolver from `f08860c` to `3d0667a` (linkerd/linkerd2-proxy#1225) * build(deps): bump libc from 0.2.100 to 0.2.101 (linkerd/linkerd2-proxy#1226) * Enable link-time optimizations (linkerd/linkerd2-proxy#1227) * build(deps): bump serde_json from 1.0.66 to 1.0.67 (linkerd/linkerd2-proxy#1228) * build(deps): bump flate2 from 1.0.20 to 1.0.21 (linkerd/linkerd2-proxy#1230) * build(deps): bump thiserror from 1.0.26 to 1.0.28 (linkerd/linkerd2-proxy#1231) * build(deps): bump futures from 0.3.16 to 0.3.17 (linkerd/linkerd2-proxy#1232) * build(deps): bump parking_lot from 0.11.1 to 0.11.2 (linkerd/linkerd2-proxy#1234) * build(deps): bump trust-dns-resolver from `3d0667a` to `v0.21.0-alpha.2` (linkerd/linkerd2-proxy#1233) * Rename push_on_response to push_on_service (linkerd/linkerd2-proxy#1235) * build(deps): bump tokio from 1.10.1 to 1.11.0 (linkerd/linkerd2-proxy#1236) * metrics: Add `target_ip` and `target_port` labels (linkerd/linkerd2-proxy#1238) * inbound: Improve policy metrics (linkerd/linkerd2-proxy#1237) * inbound: Include server labels in tap responses (linkerd/linkerd2-proxy#1239) * Revert rustc update for release builds
This release features some performance improvements: tokio has been updated to pick up tokio-rs/tokio#4055, and link-time optimizations have been enabled in release builds. These changes reduce CPU and memory overhead in benchmarks. Inbound policy enforcement has been updated so that TCP forwarding is interrupted if a policy update revokes a previously-established authorization. New metrics are exposed to reflect how policies are used by the proxy: `inbound_http_authz_{allow,deny}_total` and `inbound_tcp_authz_{allow,deny,terminate}_total`. The proxy's error metrics, `{inbound,outbound}_{http,tcp}_errors_total`, have been updated to include the traffic target. And the `traffic_addr` metric label is augmented by `target_ip` and `target_port` labels to support more flexible prometheus queries. Inbound TCP metrics now only include a `srv_name` label, as it can't be expected for all inbound connections to include authorization labels (hence the new authz metrics). However, all inbound HTTP metrics--except for the HTTP errors metric, which includes only a `srv_name` label--include both `srv_name` and `saz_name` label. Finally, the inbound and outbound proxies now only exports Route-oriented metrics when a ServiceProfile is enabled, preventing redundant metrics from being exported with no differentiating labels. --- * profiles: Avoid creating a default route stack (linkerd/linkerd2-proxy#1223) * build(deps): bump arbitrary from 1.0.1 to 1.0.2 (linkerd/linkerd2-proxy#1224) * build(deps): bump trust-dns-resolver from `f08860c` to `3d0667a` (linkerd/linkerd2-proxy#1225) * build(deps): bump libc from 0.2.100 to 0.2.101 (linkerd/linkerd2-proxy#1226) * Enable link-time optimizations (linkerd/linkerd2-proxy#1227) * build(deps): bump serde_json from 1.0.66 to 1.0.67 (linkerd/linkerd2-proxy#1228) * build(deps): bump flate2 from 1.0.20 to 1.0.21 (linkerd/linkerd2-proxy#1230) * build(deps): bump thiserror from 1.0.26 to 1.0.28 (linkerd/linkerd2-proxy#1231) * build(deps): bump futures from 0.3.16 to 0.3.17 (linkerd/linkerd2-proxy#1232) * build(deps): bump parking_lot from 0.11.1 to 0.11.2 (linkerd/linkerd2-proxy#1234) * build(deps): bump trust-dns-resolver from `3d0667a` to `v0.21.0-alpha.2` (linkerd/linkerd2-proxy#1233) * Rename push_on_response to push_on_service (linkerd/linkerd2-proxy#1235) * build(deps): bump tokio from 1.10.1 to 1.11.0 (linkerd/linkerd2-proxy#1236) * metrics: Add `target_ip` and `target_port` labels (linkerd/linkerd2-proxy#1238) * inbound: Improve policy metrics (linkerd/linkerd2-proxy#1237) * inbound: Include server labels in tap responses (linkerd/linkerd2-proxy#1239) * Revert rustc update for release builds
We recently introduced metrics to help surface inbound policy decisions,
but in practice these haven't been as useful as we might hope.
Specifically, error metrics do not include the
target_addr
label sothese metrics can't be correlated with servers, etc. This change
improves error metrics and also introduces new metrics to describe
authorization decisions: authorization denials shouldn't be classified
as errors, really, anyway.
This change also improves TCP forwarding authorization so that policy
changes can be honored at runtime: previously authorized connections may
dropped if the policy is updated so that the connection is no longer
authorized.
The gateway is also updated to enforce HTTP policies at runtime as well
so that policy changes can be honored after the connection has been
established.
This change introduces new metrics:
inbound_http_authz_allow_total
inbound_http_authz_deny_total
inbound_tcp_authz_allow_total
inbound_tcp_authz_deny_total
inbound_tcp_authz_terminate_total
allow metrics include
target_addr
,srv_name
, andsaz_name
labels. deny and terminate metics include only
target_addr
andsrv_name
labels.Authorization denials are no longer reflected in inbound_tcp_error or
inbound_http_error metrics.
A number of internal changes have been made to support this:
inbound::policy::authorize
module includes middlewares for TCPand HTTP authorization, replacing the prior method of enforcing policy
in the stack/router. This module ensures that metrics are recorded for
policy decisions.
error-metrics
crate has been removed. In its place amonitor
type has been added to the
stack
crate, supporting a general way toobserve errors, decoupled from the metrics registry.
outbound crates, respectively. Inbound- and outbound-specific error
types are also moved into their rspective crates.
app_core::errors
module has been updated to only define thetypes it needs to instrument the error response layer. Error responses
are now primarily instrumented via the
HttpError
type so that errorsthat should be handled can be configured where the error is thrown.
The error type now holds an underlying source error so that the error
metrics layer can see through this wrapper type to track the
underlying error cause.
maps. We currently read only the
name
label from each; and thislabel is required.